Abstract

On the basis of empirical evidence from molecular dynamics simulations, molecular
conformational space can be described by means of a partition into central conical regions
characterized by the dominance relations between Cartesian coordinates. This work presents a
geometric and combinatorial description of this structure.
Introduction



In previous works (Gabarro-Arpa and Revilla, 2000; Laboulais et al., 2002) the idea was put
forward that the three-dimensional structure of proteins could be encoded into binary sequences.
For a molecule with $N$ atoms in a given conformation, the procedure consisted in

- defining a procedure for enumerating the atoms, which gives an order relation,

- forming the set of all ordered sets of four atoms (4-tuples),

- as in the mesoscopic models of macromolecules atoms are represented by pointlike
structures, a 4-tuple determines a 3-simplex^1; since the atoms in the 4-tuples are ordered, a
given 3-simplex can be left or right handed. Depending on the simplex handedness each 4-tuple
is given a sign +/-,

- the set of 4-tuples can also be ordered to become a sequence; from it and the signs associated
to each 4-tuple a sign vector, with one entry in {+, -} per 4-tuple, can be constructed: the
chirotope^2, which is the desired binary sequence.

^1 a three-dimensional (3D) polytope with four vertices.
^2 in this work bold faced words refer to topics that are more fully developed in (Rosen, 2000)
and references therein.

The chirotope defines an equivalence relation between conformations: two conformations
belong to the same equivalence class if they have the same chirotope. This generates a geometrical
structure in conformational space: a partition $X$ into a set of regions (cells) whose points
(3D conformations) all have the same chirotope.

The connected components of such equivalence classes are locally compatible with a central
conical geometry: multiplying the $3 \times N$ Cartesian coordinates of a given conformation by an
arbitrary positive factor does not change the chirotope, since the handedness of a scaled
3-simplex remains unchanged. Thus, in conformational space, the points lying on a half-line
starting at the origin all belong to the same equivalence class. The term central means that the
vertices of the cones are at the origin. In the following, a partition without further
qualification means a central partition.

This simple result suggests that conformational space can be partitioned into a discrete set
of conical cells. The structure of this partition is encoded by the graph of regions $T(X)$, which
has as vertices the cells of $X$ and as edges the pairs of adjacent cells.

Since the graph is connected, a graphical distance between cells can be defined as the length of
the shortest path between the two representative vertices in the graph. The same distance between
two equivalence classes can be defined as a Hamming distance: the number of different signs
between the chirotopes of the two conformations. The latter definition was first employed in
(Gabarro-Arpa and Revilla, 2000), where no geometrical interpretation in terms of space partition
was attempted.

In the two works cited above the Hamming distance was used to analyze clusters of
conformations in molecular dynamics trajectories with good results: compared with the classical
r.m.s. deviation measure (Kabsch, 1978), it was seen to perform better and to be more robust
(Laboulais et al., 2002). This good performance can be qualitatively explained if the mesh that
results from projecting the graph of regions onto the hypersurface where the system evolves is
sufficiently fine grained to give an accurate measure, at least in the range explored by
molecular dynamics simulations.

Thus it seems not unreasonable to give a description of conformational space based on a central
partition of conical cells. However, working out the set of cells derived from the chirotope turns
out not to be practical, so in this paper we present a partition derived from a central hyperplane
arrangement, where a set of non-coplanar hyperplanes passing through the origin divides the space
into a number of conical regions^3.
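
As an illustration of the encoding described in this introduction, the following Python sketch computes a chirotope-like sign vector and the Hamming distance between two conformations. It is only a minimal sketch under assumptions not stated in the text: the function names (`orientation`, `chirotope`, `hamming`) are hypothetical, the 4-tuples are enumerated in lexicographic order of the atom indices, and a positive determinant is arbitrarily labelled +1.

```python
from itertools import combinations

def orientation(a, b, c, d):
    """Sign (+1, -1 or 0) of the oriented volume of the 3-simplex (a, b, c, d)."""
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    w = [d[k] - a[k] for k in range(3)]
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return (det > 0) - (det < 0)

def chirotope(points):
    """One sign per 4-tuple of points, in lexicographic order (an assumed ordering)."""
    return [orientation(*(points[i] for i in quad))
            for quad in combinations(range(len(points)), 4)]

def hamming(c1, c2):
    """Number of positions at which two sign vectors differ."""
    return sum(s != t for s, t in zip(c1, c2))

# Toy five-point "conformation" (hypothetical data, not a real molecule).
conf = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
scaled = [tuple(2.5 * x for x in pt) for pt in conf]   # positive scaling
mirror = [(-x, y, z) for (x, y, z) in conf]            # mirror image

print(hamming(chirotope(conf), chirotope(scaled)))   # 0: same conical cell
print(hamming(chirotope(conf), chirotope(mirror)))   # 5: every simplex changes handedness
```

On this toy point set a positive scaling leaves the sign vector unchanged, while the mirror image reverses every sign.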

A central hyperplane arrangement 

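In what follows $\mathbb{R}^N$ is a real affine space of $N$ dimensions, and $\{e_i\}$, with
$1 \le i \le N$, are its unit vectors. We define the following set of vectors

$$\mathcal{N} = \{ n_{ij} = e_i - e_j ,\ 1 \le i < j \le N \} \qquad (1)$$

Notice that if $u = (1, 1, \dots, 1)$ and $n_{ij} \in \mathcal{N}$ then $u \cdot n_{ij} = 0$.

Associated with this set there is a set of central non-coplanar hyperplanes

$$\mathcal{H}_{ij}(p) = \{ p \in \mathbb{R}^N : n_{ij} \cdot p = 0 \} \qquad (2)$$

each hyperplane divides $\mathbb{R}^N$ into the positive and negative hemispaces

$$\mathcal{H}^+_{ij}(p) = \{ p \in \mathbb{R}^N : n_{ij} \cdot p > 0 \} , \qquad \mathcal{H}^-_{ij}(p) = \{ p \in \mathbb{R}^N : n_{ij} \cdot p < 0 \}$$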

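so the hyperplane arrangement determines a partition $\mathcal{P}$ of $\mathbb{R}^N$ into a set of
convex regions (cells), where each cell $C \in \mathcal{P}$ can be characterized by an antisymmetric
$N \times N$ sign matrix $V$, such that if $p \in C$ then

$$V_{ii} = 0 , \qquad V_{ij} = \begin{cases} + & p \in \mathcal{H}^+_{ij} \\ 0 & p \in \mathcal{H}_{ij} \\ - & p \in \mathcal{H}^-_{ij} \end{cases} \quad \forall\, i < j \qquad (3)$$
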



The geometrical meaning of the sign matrix can be easily deduced from the following example:
let $p^a, p^b \in \mathbb{R}^N$ be two points with coordinates

$$p^a = (1, 2, 3, 4, 5, \dots, N)$$
$$p^b = (1, 2, 4, 3, 5, \dots, N)$$

Since $n_{ij} \cdot p^a = i - j < 0$ for all $i < j$, definition (3) gives
$V^a = \{ V^a_{ij} = - ,\ \forall\, i < j \}$ as the sign matrix of $p^a$. Now $p^b$ has the same
matrix except that $V_{34} = +$. Thus, for any point $p$ the sign matrix encodes the pairwise
dominance relations between its coordinates

$$V_{ij} = \begin{cases} + & p_i > p_j \\ 0 & p_i = p_j \\ - & p_i < p_j \end{cases} \quad \forall\, i < j$$

In the above example notice also that

$$n_{34} = p^b - p^a \qquad (4)$$

Let $\pi(N)$ be the set of points in $\mathbb{R}^N$ whose coordinates are the permutations of the
sequence $\{1, 2, 3, 4, 5, \dots, N\}$. No two points in this set have the same $V$ matrix, and
since it encodes the complete set of dominance relations between coordinates, there is a one to
one correspondence between $\pi(N)$ and the $N$-dimensional cells of $\mathcal{P}$, making a total
of $N!$ such cells in $\mathcal{P}$.
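
The sign matrix and the count of $N!$ cells can be checked numerically for small $N$. The sketch below is an illustration only: the function name `sign_matrix` and the character encoding of the signs are assumptions, and each entry is taken as the sign of $n_{ij} \cdot p = p_i - p_j$, following definitions (1) to (3).

```python
from itertools import permutations

def sign_matrix(p):
    """Sign matrix of a point p: entry (i, j) is the sign of n_ij . p = p[i] - p[j]."""
    def sign(x):
        return '+' if x > 0 else ('-' if x < 0 else '0')
    n = len(p)
    return tuple(tuple(sign(p[i] - p[j]) for j in range(n)) for i in range(n))

# Every permutation of (1, ..., N) lies in a different cell: N! distinct matrices.
N = 4
matrices = {sign_matrix(p) for p in permutations(range(1, N + 1))}
print(len(matrices))  # 24 = 4!

# The two points of the worked example (here with N = 5) differ only in the
# entries (3, 4) and (4, 3), written with 1-based indices as in the text.
pa = (1, 2, 3, 4, 5)
pb = (1, 2, 4, 3, 5)
Va, Vb = sign_matrix(pa), sign_matrix(pb)
changed = [(i + 1, j + 1) for i in range(5) for j in range(5) if Va[i][j] != Vb[i][j]]
print(changed)  # [(3, 4), (4, 3)]
```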

The graph of regions of the arrangement 

In order to study the graph of regions $T(\mathcal{P})$, it is important to notice that $V$ is the
incidence matrix of an acyclic tournament (Moon, 1968).

Tournaments are directed graphs such that between any two nodes there is always an arc (see
example in fig. 1). If $v_i$ and $v_j$ are two nodes, $V_{ij} = +$ if the arc goes from $i$ to $j$,
and we say that $v_i$ dominates $v_j$; otherwise $V_{ij} = -$ and $v_i$ is dominated by $v_j$.

^3 In what follows the term cone means a region of space determined by a set of vectors in
$\mathbb{R}^N$ such that for any finite subset of vectors it also contains all their linear
combinations with positive coefficients.

The acyclic^4 qualifier is because there are no directed cycles in the graph (as can be seen
in fig. 1). This is a particularity of the tournaments that characterize the cells of
$\mathcal{P}$: for any permutation there are always two nodes called the source and the sink
respectively; the former dominates all other nodes and the latter is dominated by every other
node in the graph (nodes 3 and 2 in fig. 1, respectively). Moreover it is a centrally symmetric
hierarchical structure:

• the graph that results from reversing all the arcs is also acyclic, 

• deleting a node always gives a subtournament. 

This tells us that each $N$-dimensional cell in $\mathcal{P}$ has exactly $N - 1$ neighbours,
since there are exactly $N - 1$ arcs in a tournament that can be reversed without creating a
directed cycle. These are the arcs joining nodes whose score differs by 1 (see the legend of
fig. 1).

$T(\mathcal{P})$ can be obtained by joining with a line segment the points in $\pi(N)$ that are in
adjacent cells; the result is the 1-skeleton of a convex polytope: the $N$-permutohedron or
$\Pi_{N-1}$ (Schoute, 1911).

The study of the faces of $\Pi_{N-1}$ is an essential part of our study of $\mathcal{P}$, since it
allows conformations and groups of conformations to be accurately located within $\Pi_{N-1}$.
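
The count of $N - 1$ neighbours can be verified on a small example. The following sketch is illustrative and rests on assumptions: the helper names are hypothetical, node $i$ is taken to dominate node $j$ when $p_i > p_j$ (the sign of $n_{ij} \cdot p$ in (3)), and acyclicity is tested through the classical property that a tournament has no directed cycle exactly when its scores are all different.

```python
def scores(p):
    """Score of each node: number of nodes it dominates plus 1 (i dominates j when p[i] > p[j])."""
    return [sum(p[i] > p[j] for j in range(len(p))) + 1 for i in range(len(p))]

def reversible_arcs(p):
    """Arcs (1-based node pairs) whose reversal leaves all scores distinct, i.e. acyclic."""
    n, s = len(p), scores(p)
    arcs = []
    for i in range(n):
        for j in range(i + 1, n):
            winner, loser = (i, j) if p[i] > p[j] else (j, i)
            t = list(s)
            t[winner] -= 1   # the former winner dominates one node less
            t[loser] += 1    # the former loser dominates one node more
            if len(set(t)) == n:
                arcs.append((i + 1, j + 1))
    return arcs

def neighbours(p):
    """Adjacent vertices: exchange the coordinates of the two nodes of a reversible arc."""
    result = []
    for i, j in reversible_arcs(p):
        q = list(p)
        q[i - 1], q[j - 1] = q[j - 1], q[i - 1]
        result.append(tuple(q))
    return result

# Hypothetical N = 6 permutation; only the choice of node 3 as source and
# node 2 as sink follows the description of fig. 1 in the text.
p = (4, 1, 6, 3, 2, 5)
print(scores(p))                # [4, 1, 6, 3, 2, 5]
print(len(reversible_arcs(p)))  # 5 = N - 1
```

Each neighbour differs from `p` in exactly two coordinates whose values differ by 1, so the joining edge is parallel to one of the vectors $n_{ij}$.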

The faces of $\Pi_{N-1}$

Central to this construction is the duality between the faces of $\Pi_{N-1}$ and the cells of
$\mathcal{P}$: $k$-faces and cells of dimension $N - k$ lie in orthogonal linear subspaces. The
sign matrix of lower dimensional cells has zeros in the entries corresponding to hyperplanes that
contain the cell, as defined in (3). This matrix can be represented by incomplete tournaments:
these are digraphs where the arcs corresponding to the zero entries have been deleted (see fig. 2).

Incomplete tournaments can be seen as patterns: we say that a given tournament matches a 
pattern if both graphs have the same order and if the pattern is a subgraph of the tournament. 

The simplest non-trivial faces in the hierarchy are the 1-faces: edges that join adjacent vertices
(0-faces). As we have seen, adjacent vertices differ in that they exchange the value of two
coordinates, say $i$ and $j$, and the edge is parallel to the vector $n_{ij}$ (4), which is
perpendicular to the hyperplane $\mathcal{H}_{ij}$. This hyperplane contains the
$(N - 1)$-dimensional boundary cell that separates the $N$-dimensional cells of the vertices;
accordingly its sign matrix has $V_{ij} = V_{ji} = 0$.

The pattern of fig. 2a, where the arc between $v_2$ and $v_6$ is missing, matches exactly two
tournaments that represent the vertices of the edge segment. The complement graphs^5 of
these vertices form a set of lower order tournaments that encode the vertices of a lower
dimensional permutohedron. In our case we have two order 2 tournaments (one can be seen in
fig. 3a) that represent the permutations of the sequence {56}: the associated permutohedron is a
line segment.

At the other extreme let us look at the $(N - 2)$-faces. Notice that for the hyperplane
arrangement described above if we construct the vector

$$u^a = \{ u^a_a = 1 - N ,\ u^a_i = 1 ,\ i \ne a ,\ 1 \le i \le N \} \qquad (5)$$

where $1 \le a \le N$; the set of vectors

$$\mathcal{N}^a = \{ n_{ij} \in \mathcal{N} : 1 \le i < j \le N ,\ i \ne a ,\ j \ne a \} \qquad (6)$$

and the subset of hyperplanes

$$\mathcal{H}^a = \{ \mathcal{H}_{ij}(p) : n_{ij} \cdot p = 0 ,\ n_{ij} \in \mathcal{N}^a ,\ p \in \mathbb{R}^N \}$$

we have $u^a \cdot n_{ij} = 0$ for all $n_{ij} \in \mathcal{N}^a$. This means that the vectors in
$\mathcal{N}^a$ are all in the $(N - 1)$-hyperplane $u^a \cdot p = 0$, consequently

• the hyperplanes in $\mathcal{H}^a$ all have a common intersection: a 2-hyperplane parallel to $u^a$,

• the set of vertices in the $N$-cells adjacent to this 2-cell all lie in a $(N - 2)$-hyperplane:
they are the vertices of a $(N - 2)$-face,

• as can be deduced from (5) and (6), the only arcs present in the pattern of this face are
those that connect node $a$ to the other nodes,

• node $a$ is either a source or a sink.

For $N = 6$ we have the example pattern of fig. 2b; fig. 3b shows the complement graph, which
is an $N = 5$ tournament. The set of all complement graphs encodes the permutations of the set
{12345}, and hence the corresponding face is a $\Pi_4$ polytope. This tells us that a total of
$2N$ faces of $\Pi_{N-1}$ are $\Pi_{N-2}$ polytopes.

^4 All along this work tournaments are implicitly assumed to be acyclic.
^5 Given a tournament T and a pattern P the complement graph is a graph with the edges that are
in T but not in P and the vertices that are in those edges (see fig. 3).

Before proceeding further we must introduce the basic notion of product polytope. If $P$ is
a polytope in $\mathbb{R}^p$ and $Q$ a polytope in $\mathbb{R}^q$, then the product polytope
$P \times Q$ is defined as the polytope whose vertices are all the points
$(x, y) \in \mathbb{R}^{p+q}$ such that $x \in \mathbb{R}^p$ is a vertex of $P$ and
$y \in \mathbb{R}^q$ is a vertex of $Q$. Examples of product polytopes are: the square, which is
the product of two segments (two polytopes of dimension 1); the cube, which is the product of a
square by a segment; and more generally the prisms, which are the product of a polygon (or
polytope) by a segment.

The example pattern from fig. 2c encodes a product polytope: the set of compatible
subtournaments formed by vertices $v_1$, $v_3$, $v_4$ and $v_5$ encodes a $\Pi_3$, and is
independent from the subtournament formed by $v_2$ and $v_6$, which encodes a segment (or
$\Pi_1$). Thus there are $N \times (N - 1)$ $(N - 2)$-dimensional faces which are prisms joining
two $\Pi_{N-3}$s from adjacent $\Pi_{N-2}$ faces.

It can be easily seen that all the faces of this polytope are either permutohedrons or
products of permutohedrons; for instance the polytope encoded by the pattern of fig. 2d is a
$\Pi_1 \times \Pi_1 \times \Pi_1$, that is: a cube.

Notice that for the product of permutohedrons the complement graphs (see figs. 3c and 3d) 
are not connected. 
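
The product construction can be made concrete by listing vertices. The sketch below is an illustration under assumptions: `permutohedron_vertices` and `product_vertices` are hypothetical helper names, and a polytope is represented only by its list of vertices.

```python
from itertools import permutations, product

def permutohedron_vertices(n):
    """Vertices of the n-permutohedron: all permutations of (1, ..., n)."""
    return list(permutations(range(1, n + 1)))

def product_vertices(P, Q):
    """Vertices of P x Q: every vertex of P concatenated with every vertex of Q."""
    return [x + y for x, y in product(P, Q)]

segment = permutohedron_vertices(2)   # the two vertices (1, 2) and (2, 1)
cube = product_vertices(product_vertices(segment, segment), segment)
prism = product_vertices(permutohedron_vertices(4), segment)

print(len(cube))    # 8 vertices, as for a cube
print(len(prism))   # 48 = 24 * 2 vertices
```

The vertex counts multiply, which is why a product of three segments has the eight vertices of a cube.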

The face lattice of $\Pi_{N-1}$

The differences among incomplete tournaments, when we disregard the identity of the nodes,
arise from the topology of the graph: number of edges and nodes, and the connectivity. We
can define an operation on patterns that consists in renumbering the nodes so that the score
never decreases upon increasing the node number. Renumbered patterns are stripped of the
complications that arise from permuting equivalent nodes; a classification of these objects based
on topological differences is far simpler while keeping their essential characteristics, and it
results in a comprehensive synthetic view of the face arrangement. Introducing the permutations
between equivalent nodes is an unnecessary complication that can always be worked out at a later
stage.

The set of equivalence classes obtained upon renumbering is isomorphic to the set $\mathcal{L}$
of partitions of the sequence $\{1, 2, 3, \dots, N\}$ into subsets of consecutive integers. The
correspondence is established as follows

• for each incomplete tournament form a sequence with the partial scores arranged in
ascending order

• divide this sequence into subsets of identical partial scores

• replace each element in a subsequence of identical scores by the corresponding node
number in the renumbered sequence.

For instance from the graph of fig. 2b we form the sequence 111116, whose corresponding partition
is (12345)6. Likewise, for figs. 2a, 2c and 2d the sequences 123455, 111155 and 113355 give the
partitions 1234(56), (1234)(56) and (12)(34)(56) respectively. These partitions represent classes
of polytopes that are combinatorially equivalent; there are $N!/(n_1! \times n_2! \times \dots)$
elements in each class, $n_i$ being the number of elements in each subset. Notice that (56), for
instance, represents the set of permutations of the sequence {56}.
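
The three steps above, together with the class-size formula, can be sketched as follows. This is an illustration only: the function names are hypothetical, and the input is assumed to be the list of partial scores read off a pattern.

```python
from itertools import groupby
from math import factorial

def partition_from_scores(partial_scores):
    """Group the ascending partial scores into blocks of consecutive node numbers."""
    blocks, start = [], 1
    for _, run in groupby(sorted(partial_scores)):
        size = len(list(run))
        blocks.append(tuple(range(start, start + size)))
        start += size
    return blocks

def format_partition(blocks):
    """Write single nodes bare and larger blocks in parentheses, e.g. 1234(56)."""
    return ''.join(str(b[0]) if len(b) == 1 else '(' + ''.join(map(str, b)) + ')'
                   for b in blocks)

def class_size(blocks):
    """Number of faces in the class: N! / (n_1! * n_2! * ...)."""
    size = factorial(sum(len(b) for b in blocks))
    for b in blocks:
        size //= factorial(len(b))
    return size

for seq in ([1, 1, 1, 1, 1, 6], [1, 2, 3, 4, 5, 5], [1, 1, 1, 1, 5, 5], [1, 1, 3, 3, 5, 5]):
    blocks = partition_from_scores(seq)
    print(format_partition(blocks), class_size(blocks))
# (12345)6 6
# 1234(56) 360
# (1234)(56) 15
# (12)(34)(56) 90
```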

There is a partial order in the set of partitions, based on containment: we say that a
partition $x$ is contained in $y$ ($x \subset y$) if each subset in $x$ is either identical to a
subset of $y$ or a subset of some subset in $y$. Thus for the above example:
$1234(56) \subset (12)(34)(56) \subset (1234)(56)$, also $(1234)56 \subset (1234)(56)$ and
$(1234)56 \subset (12345)6$, but $(1234)(56) \not\subset (12345)6$.

It can be shown that the partially ordered set (poset) $\mathcal{L}$ thus defined is a lattice,
that is: for all pairs $x, y \in \mathcal{L}$ there is a least upper bound and a greatest lower
bound.

The lattice $\mathcal{L}$ for $N = 6$ is represented in fig. 4. Each element represents a class
of faces of $\Pi_5$; they are arranged in six rows, the faces in a row having the same dimension,
which increases from 0 in the bottom row to 5 at the top. There are $\binom{N-1}{d}$ elements in
the row of dimension $d$ ($0 \le d < N$).

It should be noticed from fig. 4 the hierarchical structure of $\mathcal{L}$: each interval^6
between a given element in the lattice and the minimal element 123456 is also a lattice. This is
an expected result: the face lattice of any face is contained in the face lattice.
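
The row counts and the containment relation can be checked by enumerating the partitions for $N = 6$. The sketch is illustrative; representing a partition by its tuple of block sizes, e.g. (4, 2) for (1234)(56), and the function names are assumptions made here.

```python
from collections import Counter
from itertools import combinations

def compositions(n):
    """All partitions of 1..n into blocks of consecutive integers, as tuples of block sizes."""
    result = []
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            result.append(tuple(b - a for a, b in zip(bounds, bounds[1:])))
    return result

def dimension(sizes):
    """Dimension of the faces in the class: a block of size s contributes s - 1."""
    return sum(s - 1 for s in sizes)

def cut_points(sizes):
    """Positions where one block ends and the next begins."""
    total, cuts = 0, set()
    for s in sizes[:-1]:
        total += s
        cuts.add(total)
    return cuts

def contained(x, y):
    """x is contained in y when every block of x lies inside a block of y."""
    return cut_points(y) <= cut_points(x)

rows = Counter(dimension(c) for c in compositions(6))
print([rows[d] for d in range(6)])        # [1, 5, 10, 10, 5, 1]
print(contained((2, 2, 2), (4, 2)))       # True:  (12)(34)(56) is contained in (1234)(56)
print(contained((4, 2), (5, 1)))          # False: (1234)(56) is not contained in (12345)6
```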

A partition of conformational space 

The partition discussed in the previous sections is based on the dominance relations among
the coordinates of points in an $N$-dimensional space. In conformational space the coordinates
of each point are the coordinates of a set of $N$ points in 3D Cartesian space; as 3D Cartesian
coordinates are independent of each other it would make little sense to translate automatically
the partition described above to a $3 \times N$-dimensional space. Instead we propose the partition

$$\mathcal{P}^3 = \mathcal{P}_x \cup \mathcal{P}_y \cup \mathcal{P}_z$$

which is the union of three separate partitions: $\mathcal{P}_x$, $\mathcal{P}_y$ and
$\mathcal{P}_z$, that encode the dominance relations among the $x$, $y$ and $z$ coordinates of the
set of points respectively. $\mathcal{P}_x$, for instance, is generated by the set of hyperplanes

$$\mathcal{H}^x_{ij}(p) = \{ p \in \mathbb{R}^{3 \times N} : n^x_{ij} \cdot p = 0 \}$$

with a set of normal vectors defined as

$$\mathcal{N}^x = \{ n^x_{ij} = e^x_i - e^x_j ,\ 1 \le i < j \le N \}$$

where the $e^x_i$ are the unit vectors in $\mathbb{R}^{3 \times N}$ of the $x$ coordinates of the
3D points.

$\mathbb{R}^{3 \times N}$ can be seen as a product space
$\mathbb{R}^N \times \mathbb{R}^N \times \mathbb{R}^N$, with each factor harboring the $x$, $y$
and $z$ coordinates of the set of points. Thus, as the dual polytope of $\mathcal{P}_x$, for
instance, is $\Pi_{N-1}$, the dual of $\mathcal{P}^3$ will be
$\Pi^3_{N-1} = \Pi_{N-1} \times \Pi_{N-1} \times \Pi_{N-1}$. Its face poset can be worked out from
the observation that $\Pi^3_{N-1}$ is a $(3N - 3)$-face of $\Pi_{3N-1}$. See for example the
symmetric class of faces (12)(34)(56) in fig. 4: the poset of $\Pi^3_1$ is the interval
123456 - (12)(34)(56).

Now the question that arises is: how well do 3D point sets arising from the vertices of
$\Pi^3_{N-1}$ relate to the actual conformations of macromolecules?

An alternative representation of permutations is as 0/1 matrices: these are objects whose
only entries are 0s and 1s, with the entry 1 occurring exactly once in each row and each column.
The permutation encoded by the tournament of fig. 1 corresponds to such a 0/1 matrix;
likewise the coordinates of a vertex in $\Pi^3_{N-1}$ can be encoded by a three-dimensional 0/1
matrix, which can be regarded as a cubic lattice with only one site occupied per row in every
dimension. Analogously, we can imagine in 3D Cartesian space an $N$-point set embedded in a cubic
lattice with cell spacing of 1, spanning a box between 1 and $N$ in every dimension, with the
points located at the intersections such that there is only one point in every row in any
dimension.

^6 An interval is a subposet which contains all elements $z$ such that $x \subset z \subset y$.

We can compare in fig. 5 the 3D stereoviews of the HIV-1 integrase catalytic core C$_\alpha$
skeleton (fig. 5a)^7 with the 3D representation, fig. 5b, of the corresponding $\Pi^3_{162}$
vertex within the same cell. Although fig. 5b appears to be somewhat deformed with respect to
fig. 5a, all the characteristic folding patterns ($\alpha$-helices, $\beta$-sheets, turns ...)
appear to be conserved.

This means that a lot of the 3D structure is encoded by the set of dominance relations among 
the cartesian coordinates of individual atoms. 
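
One way to obtain a lattice point set with the same dominance relations as a given structure is to replace every coordinate by its rank along its axis. The sketch below assumes this rank construction, which the text does not spell out; the function name `lattice_vertex` and the toy coordinates are hypothetical, and ties between coordinates are assumed to be absent.

```python
def lattice_vertex(coords):
    """Replace each x, y and z coordinate by its rank (1..N) among the N points."""
    n = len(coords)
    ranks = [[0, 0, 0] for _ in range(n)]
    for axis in range(3):
        order = sorted(range(n), key=lambda i: coords[i][axis])
        for rank, i in enumerate(order, start=1):
            ranks[i][axis] = rank
    return [tuple(r) for r in ranks]

# Toy three-point set (hypothetical coordinates, not taken from a real structure).
coords = [(0.3, 2.0, -1.0), (1.7, 0.5, 4.2), (-0.4, 1.1, 0.0)]
print(lattice_vertex(coords))  # [(2, 3, 1), (3, 1, 3), (1, 2, 2)]
```

Along each axis the ranks form a permutation of 1..N, so the lattice points keep every pairwise dominance relation of the original coordinates while discarding the actual distances.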

Conclusion 

Most of the time conformational space is referenced as an abstract paradigm too complex to 
be understood. The aim of this work is to show that the geometry of conformational space is not 
beyond the reach of mathematical intuition: with the help of adequate mathematical structures 
its sheer complexity can be brought to tractable dimensions, and it can be done with existing 
and well understood mathematical tools. 

The model developed here offers a number of interesting possibilities

• the structural diversity of a macromolecule can be explored by means of combinatorial
patterns

• the classification of conformations can give a catalog of structures

• graphical paths can be used to determine and explore the paths between any two
conformations

• its hierarchical structure makes it modular

There are shortcomings too: the present model shows a loss of precision in the 3D structures
obtained; but this should not be a major problem:

• precision can be recovered with the help of ad hoc methods. Optimization of structures
within a cell should not be difficult

• there is no limit to the refinements that can be introduced into this basic model, in particular
it should not be hard to build smaller cells, or to cut the existing ones into finer slices.

The possibilities offered by the model will be the subject of forthcoming works.

^7 residues 50-212 of the integrase (Maignan et al., 1998).
Legends of Figures

Figure 1

$N = 6$ tournament corresponding to the sign matrix [sign matrix not reproduced].

The score of a node is the number of nodes it dominates plus 1 (in order to establish a
correspondence with permutations). It is annotated above each node in the figure.

Figure 2

Example incomplete tournaments for $N = 6$ matching the tournament of fig. 1. Their
respective sign matrices are [sign matrices not reproduced].

Figure 3

Complement graphs of the tournament in fig. 1 with respect to the patterns in fig. 2.

Figure 4

Poset $\mathcal{L}$ of the partitions of the sequence (123456) into subsets of consecutive
integers. The bold letters above some partitions refer to the incomplete tournaments in fig. 2.

Figure 5

a) Stereo drawing of the HIV-1 integrase catalytic core skeleton (residues 50-212 of the
integrase, Maignan et al., 1998).

b) Stereo drawing of the related vertex in $\Pi^3_{162}$.